Classification

Email: Spam / Not Spam? 

Tumor: Malignant / Benign?

$y \in \{0, 1\}$ is a binary classification problem.

0: “Negative Class” (e.g., benign tumor)

1: “Positive Class” (e.g., malignant tumor)

y is the variable to be predicted.

$y \in \{0, 1, 2, 3\}$ is a multiclass classification problem.



Threshold classifier output $h_\theta(x)$ at 0.5:

If $h_\theta(x) \ge 0.5$, predict “y = 1”

If $h_\theta(x) < 0.5$, predict “y = 0”

On this data set, linear regression appears to do something reasonable even
though this is a classification task (the thresholded line fits the data well).


Classification: y = 0 or 1
$h_\theta(x)$ in linear regression can be > 1 or < 0

With linear regression, the hypothesis can output values much
larger than one or less than zero, even if all the training
examples have labels y = 0 or y = 1.

Logistic Regression: $0 \le h_\theta(x) \le 1$

Despite its name, logistic regression is a classification algorithm.



Logistic Regression 

Hypothesis Representation

Want $0 \le h_\theta(x) \le 1$

Linear regression model:

$h_\theta(x) = \theta^T x$

Logistic regression model:

$h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$

g is called the sigmoid function or logistic function.

As $z \to \infty$, $g(z) \to 1$

As $z \to -\infty$, $g(z) \to 0$
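The limiting behaviour of the sigmoid can be checked numerically. The following is a minimal sketch in plain Python; the function name `sigmoid` and the sample values of z are choices made for this illustration, not part of the slides.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # which keeps the argument of exp() from overflowing.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# g(z) approaches 0 for very negative z, equals 0.5 at z = 0,
# and approaches 1 for very positive z.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"g({z:+.1f}) = {sigmoid(z):.6f}")
```

Splitting on the sign of z is a numerical-stability choice only; both branches compute the same function.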


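Interpretation of Hypothesis Output

$h_\theta(x)$ = estimated probability that y = 1 on input x

Example: If

$x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ \text{tumorSize} \end{bmatrix}$

$h_\theta(x) = 0.7$

Tell the patient that there is a 70% chance of the tumor being malignant.

$h_\theta(x) = P(y = 1 \mid x; \theta)$: “probability that y = 1, given x, parameterized by $\theta$”

Since y = 0 or 1, the two probabilities sum to one:

$P(y = 0 \mid x; \theta) + P(y = 1 \mid x; \theta) = 1$
$P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)$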
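As a rough illustration of reading the hypothesis as a probability, the sketch below evaluates $h_\theta(x)$ for a single input. The parameter values in `theta` and the tumor size are made up for this example; the slides do not give them.

```python
import math

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), the estimated P(y = 1 | x; theta)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

theta = [-4.0, 1.5]   # hypothetical parameters (assumption for this sketch)
x = [1.0, 3.0]        # x_0 = 1, x_1 = tumorSize (made-up value)

p_malignant = hypothesis(theta, x)   # P(y = 1 | x; theta)
p_benign = 1.0 - p_malignant         # P(y = 0 | x; theta)

print(f"P(y = 1 | x; theta) = {p_malignant:.3f}")
print(f"P(y = 0 | x; theta) = {p_benign:.3f}")
```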



Logistic Regression 
Decision boundary

Logistic regression

$h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$

$g(z) \ge 0.5$ when $z \ge 0$, so
$h_\theta(x) = g(\theta^T x) \ge 0.5$ whenever $\theta^T x \ge 0$

Predict "y = 1" if $h_\theta(x) \ge 0.5$

$g(z) < 0.5$ when $z < 0$, so
$h_\theta(x) = g(\theta^T x) < 0.5$ whenever $\theta^T x < 0$

Predict "y = 0" if $h_\theta(x) < 0.5$





$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$

Predict “y = 1" if $-3 + x_1 + x_2 \ge 0$ (that is, $\theta_0 = -3$, $\theta_1 = 1$, $\theta_2 = 1$)

$x_1 + x_2 = 3$ is the decision boundary, where $h_\theta(x) = 0.5$

y = 1 when $h_\theta(x) = g(\theta^T x) \ge 0.5 \Rightarrow \theta^T x \ge 0$

This means that any example whose features $x_1$ and $x_2$ satisfy $-3 + x_1 + x_2 \ge 0$ is predicted as y = 1.
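The linear decision rule can be sketched in a few lines of Python. The parameter vector is read off the prediction rule $-3 + x_1 + x_2 \ge 0$; the function name `predict` and the test points are choices made for this illustration.

```python
def predict(theta, x):
    """Return 1 if theta^T x >= 0 (equivalently h_theta(x) >= 0.5), else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

theta = [-3.0, 1.0, 1.0]   # theta_0 = -3, theta_1 = 1, theta_2 = 1

# Made-up points (x_1, x_2); x_0 = 1 is prepended before predicting.
for x1, x2 in [(1.0, 1.0), (2.0, 2.0), (3.0, 0.0)]:
    print((x1, x2), "->", predict(theta, [1.0, x1, x2]))
```

Note that the sigmoid never needs to be evaluated to make the prediction: the sign of $\theta^T x$ already determines which side of 0.5 the hypothesis falls on.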


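Non-linear decision boundaries

As in polynomial regression, add extra higher-order polynomial features to the hypothesis:

$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$

Predict “y = 1" if $-1 + x_1^2 + x_2^2 \ge 0$

Higher-order features give more complex decision boundaries:

$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1^2 x_2 + \theta_5 x_1^2 x_2^2 + \theta_6 x_1^3 x_2 + \dots)$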
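A small sketch of the circular boundary: the features are expanded to $[1, x_1, x_2, x_1^2, x_2^2]$ and the parameters are read off the rule $-1 + x_1^2 + x_2^2 \ge 0$. The helper names `poly_features` and `predict` and the sample points are assumptions of this illustration.

```python
def poly_features(x1, x2):
    """Map (x_1, x_2) to [1, x_1, x_2, x_1^2, x_2^2]."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2]

def predict(theta, features):
    """Return 1 if theta^T features >= 0, else 0."""
    z = sum(t * f for t, f in zip(theta, features))
    return 1 if z >= 0 else 0

# theta_0 = -1, theta_1 = theta_2 = 0, theta_3 = theta_4 = 1
theta = [-1.0, 0.0, 0.0, 1.0, 1.0]

# Points inside the unit circle are predicted 0, points on or outside it 1.
for x1, x2 in [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (2.0, 0.0)]:
    print((x1, x2), "->", predict(theta, poly_features(x1, x2)))
```

The model is still linear in the parameters; only the feature mapping makes the boundary non-linear in $x_1$ and $x_2$.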
Logistic Regression

Cost function

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$

m examples

$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n+1}$, $x_0 = 1$, $y \in \{0, 1\}$

$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

How to choose parameters $\theta$ given this training set?


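Cost function

Linear regression:

$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$

$\mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Logistic regression:

With the sigmoid hypothesis, this squared-error cost makes $J(\theta)$ non-convex in $\theta$, so logistic regression uses a different cost.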


Logistic regression cost function

$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$

Cost = 0 if y = 1, $h_\theta(x) = 1$
But as $h_\theta(x) \to 0$, Cost $\to \infty$

Captures intuition that if $h_\theta(x) = 0$
(predict $P(y = 1 \mid x; \theta) = 0$), but y = 1,
we’ll penalize the learning algorithm by a very
large cost.

The y = 0 case mirrors this: Cost = 0 if $h_\theta(x) = 0$, and Cost $\to \infty$ as $h_\theta(x) \to 1$.
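The two branches of the cost can be sketched directly in Python. The function name `cost` and the sample hypothesis values are choices made for this illustration.

```python
import math

def cost(h, y):
    """Per-example logistic cost for a hypothesis output h and a label y in {0, 1}.

    math.log raises ValueError when its argument is 0, which corresponds to
    the infinite cost of a fully confident wrong prediction.
    """
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)

# Cost is near 0 for confident correct predictions and grows without bound
# as the prediction approaches the wrong label.
for h in (0.99, 0.5, 0.01):
    print(f"h = {h:.2f}: cost if y = 1 is {cost(h, 1):.4f}, "
          f"cost if y = 0 is {cost(h, 0):.4f}")
```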




Logistic Regression 

Simplified cost function
and gradient descent

Logistic regression cost function

$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$

$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$

Note: y = 0 or 1 always

Simplified cost function

$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$


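Logistic regression cost function

$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$

$= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$

To fit parameters $\theta$:

$\min_\theta J(\theta)$

To make a prediction given new x:
Output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$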
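A minimal sketch of evaluating $J(\theta)$ over a training set, using the simplified single-expression cost. The tiny data set and the two parameter vectors are made up for illustration, and the name `compute_cost` is an assumption of this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def compute_cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h))."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / m

# Made-up training set: each row is [x_0 = 1, x_1].
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

print(compute_cost([0.0, 0.0], X, y))    # all-zero parameters: h = 0.5 everywhere
print(compute_cost([-4.5, 2.0], X, y))   # parameters that separate the classes
```

With all-zero parameters every prediction is 0.5, so the cost equals log 2; parameters that place the boundary between the two groups give a lower cost.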


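Gradient Descent

$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$

Want $\min_\theta J(\theta)$:

Repeat {

    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

    (simultaneously update all $\theta_j$)
}

Working out the partial derivative gives:

Repeat {

    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

    (simultaneously update all $\theta_j$)
}

Algorithm looks identical to linear regression! The difference is the hypothesis: here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ rather than $\theta^T x$.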
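The update rule can be sketched as a short batch gradient descent loop. This is an illustrative sketch only: the learning rate, iteration count, and the small data set are assumptions chosen for the example, and the gradient includes the $\frac{1}{m}$ factor that comes from differentiating $J(\theta)$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for logistic regression, starting from theta = 0."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # errors[i] = h_theta(x^(i)) - y^(i), computed with the current theta
        errors = [sigmoid(sum(t * v for t, v in zip(theta, x))) - yi
                  for x, yi in zip(X, y)]
        # grad[j] = (1/m) * sum_i errors[i] * x_j^(i)
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        # Simultaneous update: the new theta is built from the old one in one step.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Made-up, non-separable data: each row is [x_0 = 1, x_1].
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0],
     [1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

theta = gradient_descent(X, y)
print("theta =", theta)
print("h(x_1 = 0.5) =", sigmoid(theta[0] + theta[1] * 0.5))
print("h(x_1 = 4.0) =", sigmoid(theta[0] + theta[1] * 4.0))
```

Building the whole new `theta` list before assigning it is what makes the update simultaneous: every component of the gradient is computed from the same old parameters.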



Logistic Regression 

Multi-class classification: One-vs-all

Multiclass classification

Email foldering/tagging: Work, Friends, Family,
Hobby

Medical diagnosis: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow


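[Figure: scatter plots contrasting binary classification (two classes) with multi-class classification (three classes, each drawn with its own marker)]

One classifier per class (Class 1, Class 2, Class 3):

$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$  (i = 1, 2, 3)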


One-vs-all

Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each
class i to predict the probability that y = i.

On a new input x, to make a prediction, pick
the class i that maximizes $h_\theta^{(i)}(x)$:

$\max_i h_\theta^{(i)}(x)$
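The one-vs-all procedure can be sketched as follows: train one binary classifier per class by relabeling that class as 1 and everything else as 0, then predict with the classifier that gives the highest probability. The data set, learning rate, iteration count, and all function names here are assumptions made for this illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_binary(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for a single binary logistic regression classifier."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sigmoid(sum(t * v for t, v in zip(theta, x))) - yi
                  for x, yi in zip(X, y)]
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def train_one_vs_all(X, labels):
    """Train one classifier per class c, relabeling y = 1 if label == c else 0."""
    return {c: train_binary(X, [1 if lab == c else 0 for lab in labels])
            for c in sorted(set(labels))}

def predict_one_vs_all(classifiers, x):
    """Pick the class whose classifier outputs the largest h_theta^(i)(x)."""
    return max(classifiers,
               key=lambda c: sigmoid(sum(t * v for t, v in zip(classifiers[c], x))))

# Made-up training set: rows are [x_0 = 1, x_1, x_2], three well-separated clusters.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0],
     [1.0, 5.0, 0.0], [1.0, 6.0, 0.0], [1.0, 5.0, 1.0], [1.0, 6.0, 1.0],
     [1.0, 2.5, 5.0], [1.0, 3.5, 5.0], [1.0, 2.5, 6.0], [1.0, 3.5, 6.0]]
labels = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

classifiers = train_one_vs_all(X, labels)
for point in ([1.0, 0.5, 0.5], [1.0, 5.5, 0.5], [1.0, 3.0, 5.5]):
    print(point[1:], "-> class", predict_one_vs_all(classifiers, point))
```

Each classifier is trained independently, so the three probabilities need not sum to one; only their ranking is used for the prediction.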




